Reviews: Stochastic Continuous Greedy++: When Upper and Lower Bounds Match
This paper considers DR-submodular maximization under convex constraints and achieves optimal results in terms of both approximation ratio and query complexity. The first paper in this line of work shows that Stochastic Gradient Ascent (SGA) achieves a $(1/2-\epsilon)$-approximation using $O(1/\epsilon^2)$ stochastic gradient computations, and there have been multiple papers on this problem since then. This paper introduces SCG++, which achieves a $(1-1/e-\epsilon)$-approximation using $O(1/\epsilon^2)$ stochastic oracle calls via a novel variance reduction method.
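For intuition, here is a minimal Python sketch of the continuous greedy template with a momentum-averaged stochastic gradient, the variance reduction idea used by the earlier SCG method; SCG++'s estimator additionally exploits stochastic Hessian information, which is omitted here. The toy objective (a nonnegative quadratic, which is monotone DR-submodular on the box), the box constraint, and all parameter choices are illustrative assumptions, not the paper's setup.

```python
# A sketch of continuous greedy with a momentum-averaged stochastic gradient
# (the SCG-style variance reduction; SCG++ additionally uses stochastic
# Hessian information, omitted here). Objective and constraint are toys.
import numpy as np

rng = np.random.default_rng(0)
n, T = 20, 200

# Toy monotone DR-submodular objective on the box [0,1]^n:
# F(x) = h.x - 0.5 * x'Ax with A >= 0 entrywise, so all second partial
# derivatives of F are <= 0 (the continuous DR property).
A = rng.random((n, n))
A = 0.1 * (A + A.T)
h = A.sum(axis=1) + rng.random(n)  # keeps grad F = h - Ax >= 0 on the box

def stochastic_grad(x):
    """Unbiased but noisy gradient oracle."""
    return h - A @ x + rng.normal(scale=0.5, size=n)

x = np.zeros(n)
g = np.zeros(n)
for t in range(1, T + 1):
    rho = 2.0 / (t + 3) ** (2 / 3)                # momentum weight
    g = (1 - rho) * g + rho * stochastic_grad(x)  # averaged gradient estimate
    v = (g > 0).astype(float)                     # linear oracle over [0,1]^n
    x += v / T                                    # continuous greedy step

print("final objective value:", h @ x - 0.5 * x @ A @ x)
```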
Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTAS). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean $k$-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the $n$ input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
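As a toy illustration of the generic sketch-and-solve template behind such results, the following Python sketch projects the points with a random Gaussian (Johnson-Lindenstrauss style) map, runs Lloyd's algorithm in the sketch space, and outputs the full clustering assignment. The sketch dimension, data, and solver are illustrative choices; the paper's PTAS and streaming algorithms rely on different, more refined sketches.

```python
# Sketch-and-solve for k-means: random Gaussian projection, then Lloyd's
# algorithm in the sketch space; the output is an assignment for all n points.
# This is the generic template only, not the paper's specific algorithms.
import numpy as np

rng = np.random.default_rng(1)
n, d, k, m = 1000, 100, 5, 20        # m = sketch dimension (illustrative)

# Synthetic data: k well-separated Gaussian clusters in d dimensions.
centers = rng.normal(scale=10, size=(k, d))
labels = rng.integers(k, size=n)
X = centers[labels] + rng.normal(size=(n, d))

S = rng.normal(size=(d, m)) / np.sqrt(m)   # JL-style sketching matrix
Y = X @ S                                  # sketched points, n x m

C = Y[rng.choice(n, size=k, replace=False)].copy()  # init from data points
for _ in range(25):
    assign = ((Y[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
    for j in range(k):
        if (assign == j).any():            # keep old center if cluster empty
            C[j] = Y[assign == j].mean(axis=0)

# Evaluate the assignment back in the original space.
cost = sum(((X[assign == j] - X[assign == j].mean(axis=0)) ** 2).sum()
           for j in range(k) if (assign == j).any())
print("k-means cost of the sketched assignment:", cost)
```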
More data speeds up training time in learning halfspaces over sparse vectors
Amit Daniely, Nati Linial, Shai Shalev-Shwartz
The increased availability of data in recent years led several authors to ask whether it is possible to use data as a {\em computational} resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a {\em natural supervised learning problem} --- we consider agnostic PAC learning of halfspaces over $3$-sparse vectors in $\{-1,1,0\}^n$. This class is inefficiently learnable using $O\left(n/\epsilon^2\right)$ examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, efficiently learning this class using $O\left(n/\epsilon^2\right)$ examples is impossible.
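To make the learning problem concrete, here is a small Python sketch that samples 3-sparse examples in $\{-1,1,0\}^n$, labels them with a noisy halfspace, and fits an efficient convex-surrogate learner (logistic loss). All parameters are illustrative; the paper's contribution is the hardness result about the sample/computation trade-off, not any particular learner.

```python
# The learning problem made concrete: 3-sparse examples in {-1,1,0}^n with a
# noisy halfspace labeling, fit by an efficient convex surrogate (logistic
# loss). Purely illustrative; the paper's point is the hardness trade-off.
import numpy as np

rng = np.random.default_rng(2)
n, m, noise = 50, 5000, 0.1

def sample_3sparse(num):
    """Vectors with exactly 3 nonzero +/-1 coordinates."""
    X = np.zeros((num, n))
    for i in range(num):
        idx = rng.choice(n, size=3, replace=False)
        X[i, idx] = rng.choice([-1.0, 1.0], size=3)
    return X

w_star = rng.normal(size=n)              # target halfspace
X = sample_3sparse(m)
y = np.sign(X @ w_star)
y[rng.random(m) < noise] *= -1           # agnostic-style label noise

w = np.zeros(n)                          # logistic-loss gradient descent
for _ in range(500):
    margins = y * (X @ w)
    w -= 0.5 * (-(y / (1 + np.exp(margins))) @ X) / m

X_test = sample_3sparse(2000)
err = np.mean(np.sign(X_test @ w) != np.sign(X_test @ w_star))
print("test error against the noiseless target:", err)
```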
Empirical Bayes for multiple sample sizes · The File Drawer
Here's a data problem I encounter all the time. Let's say I'm running a website where users can submit movie ratings on a continuous 1-10 scale. For the sake of argument, let's say that the users who rate each movie are an unbiased random sample from the population of users. I'd like to compute the average rating for each movie so that I can create a ranked list of the best movies. I've got two big problems here. First, nobody is using my website.
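As a preview of where this goes, here is a minimal empirical-Bayes sketch for exactly this ratings problem under an assumed normal-normal model: estimate a shared prior from all the movies, then shrink each movie's observed average toward the global mean, more strongly when it has few ratings. The model and all numbers below are illustrative assumptions.

```python
# Empirical Bayes for the ratings problem (normal-normal model, all numbers
# illustrative): estimate the prior across movies, then shrink each movie's
# average toward the global mean, more strongly when it has few ratings.
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0                            # assumed within-movie rating variance

# Simulate: movie i has true mean mu_i and n_i ratings.
true_mu = rng.normal(6.0, 1.5, size=200)
n_i = rng.integers(1, 50, size=200)
obs_mean = np.array([rng.normal(mu, np.sqrt(sigma2 / n))
                     for mu, n in zip(true_mu, n_i)])

# Method-of-moments estimates of the prior mean m0 and variance tau2,
# using E[(obs_mean_i - m0)^2] = tau2 + sigma2 / n_i.
m0 = np.average(obs_mean, weights=n_i)
tau2 = max(np.average((obs_mean - m0) ** 2, weights=n_i)
           - sigma2 * len(n_i) / n_i.sum(), 1e-6)

# Posterior mean: precision-weighted blend of each movie's average and m0.
shrink = tau2 / (tau2 + sigma2 / n_i)
eb_mean = m0 + shrink * (obs_mean - m0)

print("raw MSE:", np.mean((obs_mean - true_mu) ** 2))
print("EB  MSE:", np.mean((eb_mean - true_mu) ** 2))
```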